Genetic Distance Measure for K-modes Algorithm

Authors

  • Ching-San Chiang
  • Shu-Chuan Chu
  • Yi-Chih Hsin
  • Ming-Hui Wang
  • M. H. WANG
Abstract

The k-means algorithm has been shown to be effective and efficient for clustering. However, it was developed for numerical data only and is not suitable for clustering non-numerical data. The k-modes algorithm extends k-means to cluster categorical objects, but it has not previously been applied to the classification of categorical data. In this paper, the k-modes algorithm is applied to the classification of categorical objects, using the Soybean and Nursery databases. In particular, a genetic algorithm is proposed for designing the dissimilarity measure, termed the Genetic Distance Measure (GDM), such that the performance of the k-modes algorithm can be improved by 10% and 76% on the Soybean and Nursery databases, respectively, compared with the conventional k-modes algorithm.
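
As a rough illustration of the conventional k-modes baseline mentioned in the abstract, the sketch below clusters categorical records using the simple matching dissimilarity (the number of attributes on which two records differ) and then classifies a new record by its nearest class mode. The abstract does not say how the paper turns k-modes into a classifier, nor how the GDM is defined, so the one-mode-per-class scheme and all function names here (`simple_matching`, `column_modes`, `k_modes`, `classify`) are assumptions made for illustration only. A learned measure such as the GDM would presumably be passed in through the `dissimilarity` parameter.

```python
from collections import Counter
import random


def simple_matching(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Most frequent value of each attribute: the 'mode' of a group."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(records, k, dissimilarity=simple_matching, max_iter=20, seed=0):
    """Basic k-modes loop; assumes at least k distinct records."""
    rng = random.Random(seed)
    distinct = list(dict.fromkeys(records))  # drop duplicates, keep order
    modes = rng.sample(distinct, k)          # initial modes are distinct
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for r in records:
            nearest = min(range(k), key=lambda j: dissimilarity(r, modes[j]))
            clusters[nearest].append(r)
        # Recompute each mode; an empty cluster keeps its previous mode.
        new_modes = [column_modes(c) if c else modes[j]
                     for j, c in enumerate(clusters)]
        if new_modes == modes:
            break
        modes = new_modes
    return modes


def classify(record, labelled_modes, dissimilarity=simple_matching):
    """Label of the nearest mode (assumed classification rule)."""
    return min(labelled_modes, key=lambda lm: dissimilarity(record, lm[1]))[0]


# Toy data, invented for this example (not from Soybean or Nursery).
train = {
    "a": [("x", "y", "z"), ("x", "y", "w"), ("x", "q", "z")],
    "b": [("p", "q", "r"), ("p", "q", "z")],
}
labelled_modes = [(label, column_modes(recs)) for label, recs in train.items()]
print(classify(("x", "y", "r"), labelled_modes))
```

Keeping the dissimilarity as a plain function argument means the clustering and classification code stay unchanged when a different measure is substituted.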

Similar resources

An Optimization K-Modes Clustering Algorithm with Elephant Herding Optimization Algorithm for Crime Clustering

Over the past few decades, the detection and prevention of crime required several years of research and analysis. Today, however, thanks to smart systems based on data mining techniques, it is possible to detect and prevent crime in considerably less time. Classification- and clustering-based smart techniques can classify and cluster crime-related samples. The most important factor in the c...

Feature Selection Using Multi Objective Genetic Algorithm with Support Vector Machine

Different approaches have been proposed for feature selection to obtain a suitable feature subset from among all features. These methods search the feature space for feature subsets that satisfy some criteria or optimize several objective functions. The objective functions are divided into two main groups: filter and wrapper methods. In filter methods, feature subsets are selected due to some measu...

Degree of Optimality as a Measure of Distance of Power System Operation from Optimal Operation

This paper presents an algorithm based on inter-solutions of having scheduled electricity generation resources and the fuzzy logic as a sublimation tool of outcomes obtained from the schedule inter-solutions. The goal of the algorithm is to bridge the conflicts between minimal cost and other aspects of generation. In the past, the optimal scheduling of electricity generation resources has been ...

K Modes Clustering Algorithm Based on a New Distance Measure

The leading partitional clustering technique, K Modes, is one of the most computationally efficient clustering methods for categorical data. In the traditional K Modes algorithm, the simple matching dissimilarity measure is used to compute the distance between two values of the same categorical attributes. This compares two categorical values directly and results in either a dif...

Extension of K-Modes Algorithm for Generating Clusters Automatically

K-Modes is a prominent algorithm for clustering data sets with categorical attributes. This algorithm is known for its simplicity and speed. K-Modes is an extension of the K-Means algorithm for categorical data. Because K-Modes is used for categorical data, the ‘Simple Matching Dissimilarity’ measure is used instead of Euclidean distance, and the ‘Modes’ of clusters are used instead of ‘Means’. Ho...

Publication date: 2005